[SPARK-27811][Core][Docs] Improve docs about spark.driver.memoryOverhead and spark.executor.memoryOverhead #24671
beliefer wants to merge 12 commits into apache:master from beliefer:improve-docs-of-overhead
Conversation
I think the proposed new description is inaccurate because interned strings and native overheads (e.g. Netty direct buffers) aren't allocated outside the executor process, per se (since off-heap memory is still in the JVM's own process address space).
Rather, I think the distinction is that these non-heap allocations don't count towards the JVM's total heap size limit, a factor which needs to be taken into account for container memory sizing: if you request a Spark executor with, say, a 4 gigabyte heap, then the actual peak memory usage of the process (from the OS's perspective) is going to be more than 4 gigabytes due to these types of off-heap allocations.
If we want to avoid container OOM kills (by YARN or Kubernetes) then we need to account for this extra overhead somewhere, hence these *memoryOverhead "fudge-factors": setting the memory overhead causes us to request a container whose total memory size is greater than the heap size.
That said, increasing the memory overhead does also result in additional memory headroom that can be used by non-driver/executor processes (like overhead from other processes in the container). Maybe we could state this explicitly, e.g. something like "... non-heap memory, including off-heap memory (e.g. ...) and memory used by other non-driver/executor processes running in the same container".
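A rough sketch of the container-sizing arithmetic described above, assuming the default 10% overhead factor with a 384 MiB floor that is applied when no explicit memoryOverhead is set; the concrete heap sizes below are only illustrative:

```scala
// Sketch: how the cluster manager's container request is derived from the heap size
// plus the memory overhead. Assumes the default overhead factor (0.10) and 384 MiB
// minimum; the specific numbers are illustrative, not taken from this PR.
object ContainerSizingSketch {
  private val OverheadFactor = 0.10
  private val OverheadMinMiB = 384L

  /** Total container memory (MiB) requested for an executor with heap `heapMiB`. */
  def containerMiB(heapMiB: Long, explicitOverheadMiB: Option[Long] = None): Long = {
    val overhead = explicitOverheadMiB.getOrElse(
      math.max((heapMiB * OverheadFactor).toLong, OverheadMinMiB))
    heapMiB + overhead
  }

  def main(args: Array[String]): Unit = {
    // With a 4 GiB heap the container request is larger than 4 GiB:
    println(containerMiB(4096))             // 4096 + 409 = 4505
    // Raising spark.executor.memoryOverhead grows the container, not the JVM heap:
    println(containerMiB(4096, Some(1024))) // 4096 + 1024 = 5120
  }
}
```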
You're correct that the relationship between these configurations and the Tungsten off-heap configuration is a bit confusing. To address that confusion, I'd prefer to expand the documentation to explicitly mention these *memoryOverhead configurations in the documentation for the Tungsten off-heap setting: I think that doc should recommend raising the memoryOverhead when setting the Tungsten config.
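A minimal sketch of the pairing suggested above, with sizes that are illustrative assumptions rather than values from this PR:

```scala
// Enabling Tungsten off-heap memory does not by itself enlarge the container request,
// so the memoryOverhead should be raised to cover it on YARN/Kubernetes.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.executor.memory", "4g")           // JVM heap
  .set("spark.memory.offHeap.enabled", "true")  // Tungsten off-heap allocations
  .set("spark.memory.offHeap.size", "2g")
  // Leave container headroom for the 2g of Tungsten off-heap plus other non-heap
  // usage (Netty direct buffers, interned strings, thread stacks, ...):
  .set("spark.executor.memoryOverhead", "3g")
```

With a layout like this, the container request leaves explicit room for the Tungsten off-heap region instead of relying on the default overhead factor.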
It might also help to more explicitly clarify that these settings only make sense in a containerized / resource limited deployment mode (e.g. not in standalone mode).
In summary, I think there's definitely room for confusion with the existing fix, but I think the right solution is an expansion of the docs to much more explicitly clarify the relationship between both sets of configurations, not a minor re-word.
Thanks for your review. Yes, I overlooked the detail that off-heap memory is still in the JVM's own process address space.
@JoshRosen Could you review this PR again and see whether there are any other issues?
Merged to master
@srowen Thanks for merging. I had thought that when two reviewers requested changes, both of them needed to approve before the PR could be merged.
What changes were proposed in this pull request?
I found that the docs of spark.driver.memoryOverhead and spark.executor.memoryOverhead are a little ambiguous. For example, the original doc of spark.driver.memoryOverhead starts with "The amount of off-heap memory to be allocated per driver in cluster mode." But MemoryManager also manages a memory region named off-heap, which is used for allocations in Tungsten mode. So the description of spark.driver.memoryOverhead is easy to confuse with that region, and spark.executor.memoryOverhead has the same problem.
How was this patch tested?
Existing UT.